Special Topics in Image Understanding (ECE 704)

Fall 2023
Professor Sanghoon Sull
School of Electrical Engineering, Korea University


In this course, we will study the latest generative models (LLMs, open-source versions, finetuning, human feedback, reward model, image generation) as well as other hot AI topics.

Objective: Learn the technical principles behind the latest popular generative models in AI.

Class lectures: Wed (periods 5-6)

Instructor: Sanghoon Sull, Engineering Building 404, 3290-3244, sull@korea.ac.kr

TA: Jaehyun Kim (김재현), Engineering Building 438, 3290-3699, jhkim@mpeg.korea.ac.kr

Course materials: Selected papers from recent conferences and research blogs

Prerequisite: machine learning, deep learning

Grading: midterm (40%), final (40%), homework (10%), attendance (10%)

Reference:

  • Selected papers and research blogs
  • Other machine learning books

Contents (tentative)

2023 0906 Introduction

2023 0913 Basics and Transformer

[L01-1] Attention Is All You Need, NIPS 2017 (Transformer)
[L01-1 ref1]  https://jalammar.github.io/illustrated-transformer/

[L01-2] Language Models are Few-Shot Learners, NeurIPS 2020 (GPT3)
[L01-2 ref2]  https://jalammar.github.io/illustrated-gpt2/

 

2023 0920 Reinforcement learning (RL), human feedback and InstructGPT (1/2)

[L01-3] Training language models to follow instructions with human feedback, NeurIPS 2022 (InstructGPT)

[L01-3 ref1] Deep Reinforcement Learning from Human Preferences, NIPS 2017

[L01-3 ref2] Fine-Tuning Language Models from Human Preferences, arXiv 2019

[L01-3 ref3] Proximal Policy Optimization Algorithms, arXiv 2017 (PPO)

Reference to RL: R. Sutton and A. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, 2018

 

2023 0927 Reinforcement learning (RL), human feedback and InstructGPT (2/2) and Llama 2

L01-0 lecture note (intro2 transformer, LLM, RL and Instruct GPT) post v2.2.pdf

 

2023 1004 Llama 2, Llama 2-Chat

[L02-1] Llama 2: Open Foundation and Fine-Tuned Chat Models, 2023 (Meta AI)

[L02-1 ref1] LLaMA: Open and Efficient Foundation Language Models, 2023 (Meta AI)

[L02-1 ref2] GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints, 2023

L02-1 LLAMA 2 2023 2307.09288 post.pdf

 

2023 1011 LoRA

[L03] LoRA: Low-Rank Adaptation of Large Language Models, ICLR 2022

L03 LoRA-Low-Rank Adaptation of LLM ICLR 2022 2106.09685 post.pdf

 

2023 1018 MEMIT

[L04-1] MEMIT: Mass-Editing Memory in a Transformer, ICLR 2023

[L04-1 ref1] ROME: Locating and Editing Factual Associations in GPT, NeurIPS 2022

[L04-1 ref2] Transformer Feed-Forward Layers Are Key-Value Memories, 2021

L04 MEMIT Mass-editing memory T ICLR 2023 2210.07229 post.pdf (with additional comments in red and green for Figure 3)

 

2023 1025 Midterm exam

 

2023 1101 UniAD

[L05-1] UniAD: Planning-oriented Autonomous Driving, CVPR 2023

[L05-1 ref1] DETR: End-to-End Object Detection with Transformers, ECCV 2020

[L05-1 ref2] TrackFormer: Multi-Object Tracking with Transformers, CVPR 2022

[L05-1 ref3] BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers, ECCV 2022

L05-1 Planning-oriented Autonomous Driving cvpr 2023 2212.10156 post.pdf

L05-0 Supple (query) short version post.pdf

 

2023 1108 Fundamentals: Training, Sampling, Acceleration, Guidance (1/2)

[L06-1] Denoising Diffusion Models: A Generative Learning Big Bang, CVPR 2023 (Tutorial)

[L06-1 ref1] Deep Unsupervised Learning using Nonequilibrium Thermodynamics, ICML 2015

[L06-1 ref2] Denoising Diffusion Probabilistic Models (DDPM), NeurIPS 2020

 

2023 1115 Fundamentals: Training, Sampling, Acceleration, Guidance (2/2)

[L06-1 ref3] Score-Based Generative Modeling through Stochastic Differential Equations, ICLR 2021

Reference
Simo Särkkä and Arno Solin, Applied Stochastic Differential Equations, volume 10, Cambridge University Press, 2019 (Def. 3.9 (White noise), Def. 4.1 (Brownian motion))

L06-1 cvpr2023-diffusion-tutorial-part-1 post.pdf

 

2023 1122 Stable Diffusion (1/2)

[L07-1] High-Resolution Image Synthesis with Latent Diffusion Models, CVPR 2022

[L07-1 ref1] Taming Transformers for High-Resolution Image Synthesis, CVPR 2021

[L07-1 ref5] Auto-Encoding Variational Bayes (VAE), arXiv:1312.6114, 2013

[L07-1 ref7] VQ-VAE: Neural Discrete Representation Learning, NIPS 2017

[L07-1 ref12] Alignments in Text-to-Image Generation (Tutorial on Vision Foundation Models), CVPR 2023

 

2023 1129 Stable Diffusion (2/2), DDIM

[L07-2-ref2] Denoising Diffusion Implicit Models (DDIM), ICLR 2021

L07-1 High-Resolution Image Synthesis with Latent Diffusion Models CVPR 2022 2112.10752 post.pdf

 

2023 1206 Distillation of Guided Diffusion Models

[L07-2] On Distillation of Guided Diffusion Models, CVPR 2023

[L07-2-ref1] Classifier-Free Diffusion Guidance, NeurIPS Workshop 2021

[L07-2-ref2] Progressive Distillation for Fast Sampling of Diffusion Models, ICLR 2022

L06-1 cvpr2023-diffusion-tutorial-part-1 modified post.pdf

 

2023 1213 Final exam

 

2023 1220 Q&A